feat: enhance subscription management with metrics and reconnection #621
base: master
Conversation
Actionable comments posted: 1
♻️ Duplicate comments (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
734-758: Consider reinserting evicted entry on rollback for complete atomicity.

The rollback logic at lines 742 and 753 removes the newly added `pubkey` from the LRU cache when eviction unsubscribe or new subscription fails. However, the `evicted` entry (which was successfully subscribed before eviction) is not reinserted into the cache. This creates a state where:

- LRU cache: neither `evicted` nor `pubkey` is present
- Pubsub client: still subscribed to `evicted` (and potentially `pubkey`)

A past review comment suggested reinserting the evicted entry to fully restore the prior state. After removing `pubkey` at line 742, capacity is available, so re-adding `evicted` won't cause another eviction:

```rust
if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
    warn!("Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
    self.lrucache_subscribed_accounts.remove(pubkey);
    self.lrucache_subscribed_accounts.add(evicted); // Restore prior state
    return Err(err);
}
```

Similarly at line 753 after subscribe failure. This ensures the cache matches the pubsub subscriptions even when partial operations fail.
Based on learnings
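The suggested rollback can be exercised in isolation. Below is a hypothetical, self-contained sketch of the pattern — a toy `LruSet` and a stubbed `unsubscribe` stand in for the crate's real `lrucache_subscribed_accounts` and pubsub client: on a failed unsubscribe of the evicted entry, the new key is removed and the evicted key re-added, restoring the prior state.

```rust
use std::collections::VecDeque;

// Toy LRU set standing in for the real lrucache_subscribed_accounts.
struct LruSet {
    cap: usize,
    keys: VecDeque<String>,
}

impl LruSet {
    fn new(cap: usize) -> Self {
        Self { cap, keys: VecDeque::new() }
    }
    /// Adds `key`; returns the evicted key if capacity was exceeded.
    fn add(&mut self, key: &str) -> Option<String> {
        self.keys.push_back(key.to_string());
        if self.keys.len() > self.cap {
            self.keys.pop_front()
        } else {
            None
        }
    }
    fn remove(&mut self, key: &str) {
        self.keys.retain(|k| k != key);
    }
    fn contains(&self, key: &str) -> bool {
        self.keys.iter().any(|k| k == key)
    }
}

/// Stand-in for the pubsub unsubscribe call; `fail` forces the error path.
fn unsubscribe(_key: &str, fail: bool) -> Result<(), String> {
    if fail { Err("ws error".to_string()) } else { Ok(()) }
}

/// Adds `pubkey`; if that evicts an entry whose unsubscribe fails, rolls the
/// cache back so `evicted` stays tracked and `pubkey` does not.
fn subscribe_with_rollback(
    cache: &mut LruSet,
    pubkey: &str,
    fail_unsub: bool,
) -> Result<(), String> {
    if let Some(evicted) = cache.add(pubkey) {
        if let Err(err) = unsubscribe(&evicted, fail_unsub) {
            cache.remove(pubkey);
            cache.add(&evicted); // restore prior state; capacity is free again
            return Err(err);
        }
    }
    Ok(())
}

fn main() {
    let mut cache = LruSet::new(1);
    cache.add("evicted_key");
    let res = subscribe_with_rollback(&mut cache, "new_key", true);
    assert!(res.is_err());
    assert!(cache.contains("evicted_key")); // prior state restored
    assert!(!cache.contains("new_key"));
    println!("rollback restored prior state");
}
```

The point of the sketch is the invariant the review asks for: after a failed partial operation, the cache membership again mirrors the still-active subscriptions.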
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (28 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (1)
magicblock-chainlink/src/remote_account_provider/mod.rs (5)
magicblock-metrics/src/metrics/mod.rs (5)
- inc_account_fetches_failed (399-401)
- inc_account_fetches_found (403-405)
- inc_account_fetches_not_found (407-409)
- inc_account_fetches_success (395-397)
- set_monitored_accounts_count (357-359)

magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs (3)
- new (46-54)
- new (311-322)
- try_new_from_url (165-180)

magicblock-chainlink/src/remote_account_provider/lru_cache.rs (4)
- new (25-34)
- pubkeys (38-41)
- pubkeys (129-135)
- len (117-123)

magicblock-chainlink/src/submux/mod.rs (1)
- new (147-158)

magicblock-chainlink/src/remote_account_provider/config.rs (2)
- default (65-72)
- try_new_with_metrics (27-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_lint
- GitHub Check: run_make_ci_test
- GitHub Check: Build Project
🔇 Additional comments (10)
magicblock-chainlink/src/remote_account_provider/mod.rs (10)
1-8: LGTM: Import additions support new features.

The new imports (`hash_map::Entry`, `task`, `time`, and metrics functions) are necessary for the Entry API usage, background metrics updater, and metrics integration.

Also applies to: 35-36, 49-53
200-270: LGTM: Background metrics updater implementation is sound.

The periodic task correctly compares LRU cache and pubsub client subscription counts, logs discrepancies with detailed debugging info, and updates the monitored accounts gauge. The conditional debug logging is a good optimization to avoid unnecessary work in production.
283-312: LGTM: Constructor properly initializes LRU cache and optional metrics updater.

The initialization correctly handles the NonZeroUsize capacity constraint and conditionally starts the background metrics updater based on configuration. The SAFETY comment appropriately documents the unwrap.
634-641: LGTM: Entry API correctly appends to existing waiters.

The use of `Entry::Occupied` to append senders and `Entry::Vacant` to insert new entries ensures that concurrent callers waiting for the same account will all receive the fetch result, fixing the previous issue where existing waiters were dropped.
801-824: LGTM: Unsubscribe maintains consistency between LRU cache and pubsub client.

The function correctly checks LRU cache state before attempting pubsub unsubscription, and only removes from the cache after a successful pubsub unsubscribe. If pubsub unsubscribe fails, the entry remains in the LRU cache to maintain consistency with the actual subscription state.
849-866: LGTM: Error notification helper is well-structured.

The `notify_error` helper correctly removes all pending requests for the failed accounts, increments the failure metric, and sends error results to all waiters. This ensures no requests are left hanging when a fetch fails.
874-886: LGTM: Retry macro implementation is correct.

The retry macro properly decrements the counter, checks max retries, invokes the error notification helper on exhaustion, and uses `continue` to loop back for another attempt with a 400ms delay. The logic is sound.
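The loop shape described above can be sketched stand-alone. This is an illustrative synchronous analogue — std `sleep` in place of tokio's delay, a generic function in place of the crate's `retry!` macro — not the actual implementation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retries `op` up to `max_retries` attempts (must be >= 1) with a fixed
/// delay between attempts, mirroring the reviewed macro's shape:
/// decrement the counter, check exhaustion, otherwise wait and loop back.
fn retry_with_delay<T, E>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut remaining = max_retries;
    loop {
        match op() {
            Ok(val) => return Ok(val),
            Err(err) => {
                remaining -= 1;
                if remaining == 0 {
                    // The real macro notifies waiters and bumps the
                    // failure metric here before bailing out.
                    return Err(err);
                }
                sleep(delay); // the reviewed code waits 400ms here
            }
        }
    }
}

fn main() {
    let mut attempts = 0;
    let res: Result<u32, &str> =
        retry_with_delay(3, Duration::from_millis(1), || {
            attempts += 1;
            if attempts < 3 { Err("not yet") } else { Ok(42) }
        });
    assert_eq!(res, Ok(42));
    assert_eq!(attempts, 3); // succeeded on the third attempt
}
```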
937-937: LGTM: Helius compatibility for context slot errors.

Good catch to handle both the standard `JSON_RPC_SERVER_ERROR_MIN_CONTEXT_SLOT_NOT_REACHED` and the Helius-specific `HELIUS_CONTEXT_SLOT_NOT_REACHED` error codes. This ensures compatibility across different RPC providers.
983-1004: Verify metrics semantics for "mark empty if not found" accounts.

At lines 991-1003, accounts that are not found but are in the `mark_empty_if_not_found` list are counted in `found_count` (line 992) rather than `not_found_count`. This means the `account_fetches_found` metric includes accounts that were fabricated as empty rather than actually found on-chain.

This might be intentional to reflect that the caller received a usable account data structure, but it could be misleading when analyzing metrics to understand actual on-chain state. Consider whether these should increment a separate counter (e.g., `account_fetches_marked_empty`) or be clarified in documentation.
976-1015: LGTM: Metrics tracking for fetch results.

The metrics correctly track found vs. not-found accounts and increment the success counter for the entire batch. The counts are updated based on whether each account was present in the RPC response.
```rust
            // When an account is not present for the desired
            // min-context slot then we normally get the below
            // handled `RpcResponseError`, but may also get the
            // following error from the RPC.
            // See test::ixtest_existing_account_for_future_slot
            // ```
            // RpcError(
            //     ForUser(
            //         "AccountNotFound: \
            //          pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
            //          pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
            //          RPC response error -32016: \
            //          Minimum context slot has not been reached; ",
            //     ),
            // )
            // ```
            retry!("Fetching accounts failed: {rpc_user_err:?}");
        }
    }
```
🧹 Nitpick | 🔵 Trivial
Consider narrowing the retry scope for ForUser errors.
The code retries on all RpcError::ForUser errors, but the comment at lines 915-919 specifically mentions the "AccountNotFound: ... Minimum context slot has not been reached" error as the expected case. Retrying on all ForUser errors might lead to unnecessary retries for unrelated errors (e.g., authentication failures, rate limits, or other user-facing errors).
Consider checking if the error message contains "Minimum context slot" or "AccountNotFound" before retrying, or at least log which type of ForUser error triggered the retry.
🤖 Prompt for AI Agents
In magicblock-chainlink/src/remote_account_provider/mod.rs around lines 915 to
931, the code currently retries on any RpcError::ForUser but should only retry
for the specific "AccountNotFound" / "Minimum context slot" case described in
the comment; change the retry logic to inspect the ForUser error string (or enum
variant if available) and only call retry when the message contains "Minimum
context slot" or "AccountNotFound", otherwise propagate or log the error (and
include the exact error when logging) so unrelated user-facing errors do not
trigger retries.
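One minimal way to implement the suggested narrowing is a message predicate, sketched here as a hypothetical helper — the matched substrings come from the error quoted in the code comment, and the function name is illustrative:

```rust
/// Returns true only for the ForUser messages the review identifies as
/// retryable (the min-context-slot / account-not-found flavour); other
/// user-facing errors (auth failures, rate limits) should propagate.
fn is_retryable_for_user(msg: &str) -> bool {
    msg.contains("Minimum context slot") || msg.contains("AccountNotFound")
}

fn main() {
    assert!(is_retryable_for_user(
        "AccountNotFound: pubkey=...: RPC response error -32016: \
         Minimum context slot has not been reached; "
    ));
    assert!(!is_retryable_for_user("rate limit exceeded"));
    println!("predicate behaves as expected");
}
```

In the real match arm, the non-retryable branch would log the full error and propagate it instead of invoking `retry!`.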
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-metrics/src/metrics/mod.rs (9 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-metrics/src/metrics/mod.rs
🧬 Code graph analysis (1)
magicblock-metrics/src/metrics/mod.rs (2)
magicblock-chainlink/src/remote_account_provider/mod.rs (1)
- new (275-330)

magicblock-aperture/src/state/subscriptions.rs (4)
- new (342-355)
- new (410-413)
- count (335-337)
- count (370-372)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_test
- GitHub Check: Build Project
- GitHub Check: run_make_ci_lint
🔇 Additional comments (4)
magicblock-metrics/src/metrics/mod.rs (4)
281-284: LGTM!

The new account fetch metrics are properly registered following the established pattern.
395-409: LGTM!

The increment helper functions are well-designed:
- Consistent naming with their corresponding metrics
- Appropriate use of `inc_by(count)` for batch increments
- Clean and straightforward implementation
357-359: All call sites verified as updated.

The function rename is complete — no remaining references to the old function name, and the import and call sites in magicblock-chainlink have been correctly updated to use `set_monitored_accounts_count`.
241-287: Removal of SUBSCRIPTIONS_COUNT_GAUGE is safe and complete.

Verification confirms that `SUBSCRIPTIONS_COUNT_GAUGE` has been fully removed with no orphaned references. The replacement metric `RPC_WS_SUBSCRIPTIONS_COUNT` is properly defined, registered, and actively used throughout the codebase (imported and utilized in magicblock-aperture/src/state/subscriptions.rs). No breaking changes detected.
```rust
    // Account fetch results from network (RPC)
    pub static ref ACCOUNT_FETCHES_SUCCESS_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_success_count",
            "Total number of successful network \
             account fetches",
        )
        .unwrap();

    pub static ref ACCOUNT_FETCHES_FAILED_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_failed_count",
            "Total number of failed network account fetches \
             (RPC errors)",
        )
        .unwrap();

    pub static ref ACCOUNT_FETCHES_FOUND_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_found_count",
            "Total number of network account fetches that \
             found an account",
        )
        .unwrap();

    pub static ref ACCOUNT_FETCHES_NOT_FOUND_COUNT: IntCounter =
        IntCounter::new(
            "account_fetches_not_found_count",
            "Total number of network account fetches where \
             account was not found",
        )
        .unwrap();
```
🧹 Nitpick | 🔵 Trivial
Clarify the semantic relationship between SUCCESS/FAILED and FOUND/NOT_FOUND metrics.
The four metrics track two orthogonal dimensions:
- SUCCESS_COUNT / FAILED_COUNT: Whether the RPC call itself succeeded or failed
- FOUND_COUNT / NOT_FOUND_COUNT: Whether the account exists on-chain
This means a successful RPC call that returns "account does not exist" would increment both SUCCESS_COUNT and NOT_FOUND_COUNT. While this design is reasonable, the semantic overlap could confuse metric consumers.
Consider adding a comment block above these metric definitions explaining:
- The two orthogonal dimensions being tracked
- Example scenarios (e.g., "A successful RPC call that finds no account increments both SUCCESS and NOT_FOUND")
- The relationship: SUCCESS = FOUND + NOT_FOUND (approximately, barring RPC errors)
```rust
// Account fetch results from network (RPC)
// These metrics track two dimensions:
// 1. RPC call outcome: SUCCESS (call succeeded) vs FAILED (RPC error)
// 2. Account existence: FOUND (account exists) vs NOT_FOUND (account doesn't exist)
// Note: A successful RPC call increments SUCCESS and either FOUND or NOT_FOUND.
// Therefore: SUCCESS ≈ FOUND + NOT_FOUND (modulo edge cases)
pub static ref ACCOUNT_FETCHES_SUCCESS_COUNT: IntCounter =
```

🤖 Prompt for AI Agents
magicblock-metrics/src/metrics/mod.rs around lines 164 to 195: add a clarifying
comment block immediately above the four ACCOUNT_FETCHES_* metric definitions
that explains these metrics measure two orthogonal dimensions (RPC outcome:
SUCCESS vs FAILED, and account existence: FOUND vs NOT_FOUND), give one or two
short examples (e.g., a successful RPC that finds no account increments SUCCESS
and NOT_FOUND), and state the relationship (SUCCESS ≈ FOUND + NOT_FOUND modulo
RPC errors); place this concise explanatory text in plain English so metric
consumers understand the overlap and how to interpret the counters.
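To make the invariant concrete, here is a toy model using plain integers in place of the Prometheus counters. It counts per account per successful batch — one plausible reading of the reviewed code, not a confirmed detail.

```rust
// Toy stand-in for the four ACCOUNT_FETCHES_* counters.
#[derive(Default)]
struct FetchMetrics {
    success: u64,
    failed: u64,
    found: u64,
    not_found: u64,
}

impl FetchMetrics {
    /// Records one successful RPC batch: `found` accounts existed on chain,
    /// `not_found` did not. The RPC call itself succeeded either way.
    fn record_success(&mut self, found: u64, not_found: u64) {
        self.success += found + not_found;
        self.found += found;
        self.not_found += not_found;
    }
    /// Records accounts whose fetch hit an RPC error.
    fn record_failure(&mut self, count: u64) {
        self.failed += count;
    }
}

fn main() {
    let mut m = FetchMetrics::default();
    m.record_success(3, 2); // 3 accounts found, 2 absent; RPC succeeded
    m.record_failure(1);    // one fetch hit an RPC error
    // The relationship SUCCESS = FOUND + NOT_FOUND holds for successful calls;
    // failed calls contribute to neither side of it.
    assert_eq!(m.success, m.found + m.not_found);
    assert_eq!((m.success, m.failed), (5, 1));
    println!("SUCCESS = FOUND + NOT_FOUND holds");
}
```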
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_lint
- GitHub Check: Build Project
🔇 Additional comments (1)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
69-69: Verify the intent and scope of blacklisting the SPL Token program.

The implementation correctly adds the SPL Token program to the blacklist. However, please confirm:
- This blacklists only the Token program account itself (TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA), not individual token accounts created by the program.
- The Token-2022/Token Extensions program (TokenzQdBNbLqP5VEhdkAS6EPFLC1PHnBqCXEpPxuEb) remains unblacklisted—is this intentional?
- What subscription management issue prompted this addition?
```rust
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
    solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
```
🧹 Nitpick | 🔵 Trivial
Consider adding a comment explaining the blacklisting rationale.
Similar to NATIVE_SOL_ID (lines 10-11), consider adding a comment explaining why the SPL Token program account is being blacklisted. This helps future maintainers understand the intent.
Example:

```diff
+ // The SPL Token program account should not be monitored directly
+ // (individual token accounts are fine, but the program account itself is blacklisted)
  const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
      solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
// The SPL Token program account should not be monitored directly
// (individual token accounts are fine, but the program account itself is blacklisted)
const NATIVE_TOKEN_PROGRAM_ID: Pubkey =
    solana_sdk::pubkey!("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
```
🤖 Prompt for AI Agents
In magicblock-chainlink/src/chainlink/blacklisted_accounts.rs around lines 51 to
52, add a concise comment above NATIVE_TOKEN_PROGRAM_ID explaining why the SPL
Token program account is blacklisted (for example: to prevent
token-program-driven account creation/interaction or to avoid treating the token
program as a user-owned account in checks), mirroring the style and level of
detail used for NATIVE_SOL_ID on lines 10–11; keep the comment short,
action-oriented, and focused on intent for future maintainers.
* master: fix: fetching CUs for metrics (#623)
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
magicblock-api/src/tickers.rs (1 hunks)
magicblock-metrics/src/service.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-metrics/src/service.rs
🧬 Code graph analysis (1)
magicblock-api/src/tickers.rs (2)
magicblock-api/src/magic_validator.rs (1)
- ledger (735-737)

magicblock-metrics/src/metrics/mod.rs (1)
- set_ledger_size (297-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Project
- GitHub Check: run_make_ci_test
- GitHub Check: run_make_ci_lint
🔇 Additional comments (1)
magicblock-api/src/tickers.rs (1)
111-130: LGTM! Clean implementation with proper cancellation handling.

The simplified implementation correctly:
- Handles ledger storage size retrieval errors with appropriate logging
- Uses `tokio::select!` for graceful cancellation
- Clones the `Arc<Ledger>` before moving into the async block
magicblock-api/src/tickers.rs
Outdated
```rust
#[allow(unused_variables)]
pub fn init_system_metrics_ticker(
    tick_duration: Duration,
    ledger: &Arc<Ledger>,
    accountsdb: &Arc<AccountsDb>,
    token: CancellationToken,
) -> tokio::task::JoinHandle<()> {
```
🧩 Analysis chain
Verify the unused accountsdb parameter is intentional.
The accountsdb parameter is no longer used in the simplified implementation, yet it remains in the function signature. While the #[allow(unused_variables)] attribute acknowledges this, keeping unused parameters can mislead callers about the function's behavior.
Please confirm:
- Is this parameter kept for API compatibility during a transition period?
- Should it be removed in a future breaking change?
- Should the function documentation explicitly note that `accountsdb` is currently unused but kept for compatibility?
To verify the impact of removing this parameter, run:
🏁 Script executed:
```shell
#!/bin/bash
# Search for all call sites of init_system_metrics_ticker
rg -n -A5 -B2 'init_system_metrics_ticker\s*\(' --type rust
```

Length of output: 1143
---
🏁 Script executed:
```shell
#!/bin/bash
# Get the complete function implementation to see if accountsdb is used
sed -n '104,130p' magicblock-api/src/tickers.rs
```

Length of output: 838
🏁 Script executed:
```shell
#!/bin/bash
# Search comprehensively for all references to init_system_metrics_ticker (function calls only, not definition)
rg -n 'init_system_metrics_ticker' --type rust | grep -v '^\s*pub fn init_system_metrics_ticker'
```

Length of output: 353
🏁 Script executed:
```shell
#!/bin/bash
# Check the call site in magic_validator.rs to see the full context
sed -n '220,235p' magicblock-api/src/magic_validator.rs
```

Length of output: 466
Remove the unused accountsdb parameter from function signature and call site.
The accountsdb parameter is declared but never used in the function implementation. The #[allow(unused_variables)] attribute is suppressing a compiler warning that indicates incomplete refactoring.
Required fixes:
- Remove `accountsdb: &Arc<AccountsDb>,` from the function signature at magicblock-api/src/tickers.rs:108
- Remove `&accountsdb,` from the function call at magicblock-api/src/magic_validator.rs:227
🤖 Prompt for AI Agents
In magicblock-api/src/tickers.rs around lines 104–110, the function
init_system_metrics_ticker declares an unused parameter accountsdb and silences
the warning with #[allow(unused_variables)]; remove the accountsdb:
&Arc<AccountsDb>, parameter from the function signature and delete the
#[allow(unused_variables)] attribute, then update all call sites (specifically
remove the &accountsdb, argument at magicblock-api/src/magic_validator.rs:227)
so the function is invoked with the remaining parameters; also remove any
now-unused AccountsDb imports or bindings to keep the code clean.
magicblock-api/src/tickers.rs
Outdated
```rust
/*
#[allow(unused_variables)]
pub fn init_system_metrics_ticker_old(
    tick_duration: Duration,
    ledger: &Arc<Ledger>,
    accountsdb: &Arc<AccountsDb>,
    token: CancellationToken,
) -> tokio::task::JoinHandle<()> {
    fn try_set_ledger_counts(ledger: &Ledger) {
        macro_rules! try_set_ledger_count {
            ($name:ident) => {
                paste::paste! {
                    match ledger.[< count_ $name >]() {
                        Ok(count) => {
                            metrics::[< set_ledger_ $name _count >](count);
                        }
                        Err(err) => warn!(
                            "Failed to get ledger {} count: {:?}",
                            stringify!($name),
                            err
                        ),
                    }
                }
            };
        }
        try_set_ledger_count!(block_times);
        try_set_ledger_count!(blockhashes);
        try_set_ledger_count!(slot_signatures);
        try_set_ledger_count!(address_signatures);
        try_set_ledger_count!(transaction_status);
        try_set_ledger_count!(transaction_successful_status);
        try_set_ledger_count!(transaction_failed_status);
        try_set_ledger_count!(transactions);
        try_set_ledger_count!(transaction_memos);
        try_set_ledger_count!(perf_samples);
        try_set_ledger_count!(account_mod_data);
    }
    fn try_set_ledger_storage_size(ledger: &Ledger) {
        match ledger.storage_size() {
            Ok(byte_size) => metrics::set_ledger_size(byte_size),
            Err(err) => warn!("Failed to get ledger storage size: {:?}", err),
        }
    }
    fn set_accounts_storage_size(bank: &Bank) {
        let byte_size = bank.accounts_db_storage_size();
        metrics::set_accounts_size(byte_size);
    }
    fn set_accounts_count(bank: &Bank) {
        metrics::set_accounts_count(bank.accounts_db.get_accounts_count());
    }
    let ledger = ledger.clone();
    let bank = bank.clone();
    tokio::task::spawn(async move {
        loop {
            tokio::select! {
                _ = tokio::time::sleep(tick_duration) => {
                    try_set_ledger_storage_size(&ledger);
                    set_accounts_storage_size(&bank);
                    try_set_ledger_counts(&ledger);
                    set_accounts_count(&bank);
                },
                _ = token.cancelled() => {
                    break;
                }
            }
        }
    })
    tokio::task::spawn(async move {})
}
*/
```
🧹 Nitpick | 🔵 Trivial
Consider removing the commented-out old implementation.
The 72-line commented block preserves the previous multi-metric implementation. While this may serve as temporary reference during the transition, large commented sections can clutter the codebase and create maintenance overhead.
Recommendations:
- If this is a temporary reference during a transition period, add a `TODO` comment with a timeline for removal (e.g., "TODO: Remove after v2.0 release").
- If the old implementation may be needed for rollback, consider preserving it in a separate branch or commit rather than in comments.
- If permanently removed, delete the commented code—Git history will preserve it if needed for future reference.
🤖 Prompt for AI Agents
In magicblock-api/src/tickers.rs around lines 132 to 204, there is a 72-line
commented-out old implementation (init_system_metrics_ticker_old) cluttering the
file; remove the large commented block or replace it with a short TODO note. If
you need a temporary reference during the transition, replace the commented code
with a single-line TODO including a removal timeline (e.g., "TODO: remove after
vX.Y") or move the old implementation to a separate commit/branch, otherwise
delete the commented block entirely so Git history retains the prior code.
```rust
// We must consume the body fully to keep the connection alive. We
// iterate over all chunks and simply drop them. This prevents garbage
// data of previous requests from being stuck in connection buffer.
let mut body = req.into_body();
while (body.frame().await).is_some() {}

result
```
🧩 Analysis chain
Add timeout and error handling to body draining loop.
The body draining logic has the following concerns:
- DoS vulnerability: the loop has no timeout, allowing a malicious or slow client to tie up the server indefinitely by sending a large body slowly. Consider adding a timeout (e.g., using `tokio::time::timeout`) or a maximum frame count limit.
- Silent error handling: errors from `body.frame().await` are silently ignored. At minimum, log errors for debugging purposes.
- Performance: the body is drained for every request, including GET requests that typically have no body. While the overhead is minimal for empty bodies, consider checking `req.headers().get("content-length")` first to skip draining when there's no body.
Apply this diff to add timeout and error logging:
- // We must consume the body fully to keep the connection alive. We
- // iterate over all chunks and simply drop them. This prevents garbage
- // data of previous requests from being stuck in connection buffer.
- let mut body = req.into_body();
- while (body.frame().await).is_some() {}
+ // We must consume the body fully to keep the connection alive. We
+ // iterate over all chunks and simply drop them. This prevents garbage
+ // data of previous requests from being stuck in connection buffer.
+ let mut body = req.into_body();
+ let drain_timeout = tokio::time::Duration::from_secs(30);
+ match tokio::time::timeout(drain_timeout, async {
+ while let Some(frame_result) = body.frame().await {
+ if let Err(e) = frame_result {
+ warn!("Error draining request body: {:?}", e);
+ }
+ }
+ })
+ .await
+ {
+ Err(_) => warn!("Timeout draining request body"),
+ Ok(_) => {}
+ }Additionally, verify whether this body draining is necessary with the current version of Hyper. Modern versions typically handle request body consumption automatically for keep-alive connections:
🌐 Web query:
Does Hyper automatically consume request bodies for HTTP/1.1 keep-alive connections?
💡 Result:
Short answer: No. Hyper does not implicitly drain/consume an unread request body for you — if you want the TCP connection to be reused for HTTP/1.1 keep‑alive you must read (or explicitly discard) the request body before returning a response, or configure your server/connection behavior accordingly. See hyper's docs on request bodies and collecting them, and the HTTP/1.1 messaging guidance that servers should read or discard a body if they intend to keep a connection open. [1][2][3]
Sources:
- hyper guide: request/response bodies and collecting the request body. [1]
- hyper server/conn::Http options (http1_keep_alive behavior). [2]
- HTTP messaging spec: servers SHOULD read/discard request bodies if they will keep the transport open (HTTP/1.1 guidance). [3]
Add timeout and error handling to body draining loop to prevent DoS and improve observability.
The body draining logic correctly implements HTTP/1.1 keep-alive support, as Hyper requires explicit body consumption before reusing the connection. However, the implementation needs hardening:
- DoS vulnerability: the loop has no timeout, allowing a malicious or slow client to tie up the server indefinitely by sending a large body slowly. Add a timeout (e.g., using `tokio::time::timeout`) or a maximum frame count limit.
- Silent error handling: errors from `body.frame().await` are silently ignored, making debugging difficult. At minimum, log errors.
- Performance optimization (optional): skip body draining for requests without a body by checking `req.headers().get("content-length")` first.
Apply this diff to add timeout and error logging:
- // We must consume the body fully to keep the connection alive. We
- // iterate over all chunks and simply drop them. This prevents garbage
- // data of previous requests from being stuck in connection buffer.
- let mut body = req.into_body();
- while (body.frame().await).is_some() {}
+ // We must consume the body fully to keep the connection alive. We
+ // iterate over all chunks and simply drop them. This prevents garbage
+ // data of previous requests from being stuck in connection buffer.
+ let mut body = req.into_body();
+ let drain_timeout = tokio::time::Duration::from_secs(30);
+ match tokio::time::timeout(drain_timeout, async {
+ while let Some(frame_result) = body.frame().await {
+ if let Err(e) = frame_result {
+ warn!("Error draining request body: {:?}", e);
+ }
+ }
+ })
+ .await
+ {
+ Err(_) => warn!("Timeout draining request body"),
+ Ok(_) => {}
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In magicblock-metrics/src/service.rs around lines 131 to 137, the body-draining
loop needs hardening: wrap the await in a tokio::time::timeout (e.g., a few
seconds) to avoid slow-client DoS, track a maximum frame count (e.g., a
reasonable upper bound) to prevent infinite loops, and check whether the request
actually has a body first (inspect Content-Length or absence of
transfer-encoding) to skip draining when unnecessary. On timeout or any Err from
body.frame().await, log the error with context and break the loop; on Ok(None)
stop normally. Ensure the timeout and frame count are configurable constants and
that logging includes request id/context for observability.
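The timeout-plus-frame-budget idea can be illustrated without Hyper or Tokio at all. The following is a hypothetical std-only sketch: `drain_with_budget` and its plain-`Vec` frames are stand-ins for the real body stream, showing only the bounded-drain and error-logging behavior the prompt asks for.

```rust
// Minimal sketch of draining a request body under a maximum frame budget.
// `frames` stands in for hyper's body stream; errors are logged, not propagated.
fn drain_with_budget<I>(frames: I, max_frames: usize) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
{
    let mut drained = 0;
    for frame in frames {
        if drained >= max_frames {
            // Bail out instead of letting a slow or malicious client
            // feed us frames forever.
            return Err(format!("gave up after {drained} frames"));
        }
        match frame {
            Ok(_chunk) => drained += 1, // drop the chunk
            Err(e) => {
                // Log and keep draining; one bad frame should not leave
                // garbage in the connection buffer.
                eprintln!("error draining request body: {e}");
                drained += 1;
            }
        }
    }
    Ok(drained)
}

fn main() {
    let body = vec![
        Ok::<Vec<u8>, String>(vec![1; 4]),
        Err("oops".to_string()),
        Ok(vec![2; 4]),
    ];
    // Three frames fit comfortably within a budget of ten.
    assert_eq!(drain_with_budget(body, 10), Ok(3));

    // A stream longer than the budget is cut off with an error.
    let endless = (0..100).map(|_| Ok::<Vec<u8>, String>(vec![0]));
    assert!(drain_with_budget(endless, 5).is_err());
}
```

In the real handler the same bound would sit inside the `tokio::time::timeout` wrapper shown in the diff, so both wall-clock time and frame count are capped.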
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (1)
128-141: Add verification after second undelegation at line 140.

The test verifies undelegation after the first `schedule_intent` (line 128), but the second `schedule_intent` at line 132 with `Some(vec![102])` also triggers undelegation, since any `Some(...)` value in `counter_diffs` sets `is_undelegate = true`. For consistency and completeness, add `verify_undelegation_in_ephem_via_owner(&[payer.pubkey()], &ctx);` after the assertion block ends at line 140.

magicblock-chainlink/src/remote_account_provider/mod.rs (2)
911-970: Retry scope for ForUser errors could be narrower.

The code retries on all `RpcError::ForUser` errors (line 930), but the comment (lines 915-919) suggests only the "AccountNotFound: Minimum context slot has not been reached" case requires retries. Other ForUser errors (authentication failures, rate limits, etc.) won't resolve with retries and waste retry budget.

This is acceptable for a background task with limited retries, but consider narrowing the retry condition:

```diff
 RpcError::ForUser(ref rpc_user_err) => {
-    // When an account is not present for the desired
-    // min-context slot then we normally get the below
-    // handled `RpcResponseError`, but may also get the
-    // following error from the RPC.
-    // See test::ixtest_existing_account_for_future_slot
-    // RpcError(
-    //     ForUser(
-    //         "AccountNotFound: \
-    //          pubkey=DaeruQ4SukTQaJA5muyv51MQZok7oaCAF8fAW19mbJv5: \
-    //          RPC response error -32016: \
-    //          Minimum context slot has not been reached; ",
-    //     ),
-    // )
-    retry!("Fetching accounts failed: {rpc_user_err:?}");
+    // Retry only for context slot issues
+    if rpc_user_err.contains("Minimum context slot")
+        || rpc_user_err.contains("AccountNotFound") {
+        retry!("Fetching accounts failed: {rpc_user_err:?}");
+    } else {
+        let err_msg = format!(
+            "RpcError fetching accounts {}: ForUser({rpc_user_err})",
+            pubkeys_str(&pubkeys)
+        );
+        notify_error(&err_msg);
+        return;
+    }
 }
```
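The narrowed condition boils down to a string predicate on the RPC error message. A standalone, hypothetical sketch (`should_retry` is illustrative and not code from the PR; the message fragments mirror the ones quoted above):

```rust
// Hypothetical predicate: is a ForUser RPC error worth retrying?
// Only context-slot races and transient AccountNotFound cases qualify;
// auth failures, rate limits, etc. should fail fast.
fn should_retry(rpc_user_err: &str) -> bool {
    rpc_user_err.contains("Minimum context slot")
        || rpc_user_err.contains("AccountNotFound")
}

fn main() {
    assert!(should_retry(
        "AccountNotFound: RPC response error -32016: \
         Minimum context slot has not been reached; "
    ));
    // A non-transient failure must not consume retry budget.
    assert!(!should_retry("401 Unauthorized: invalid api key"));
}
```

Matching on substrings is brittle if the provider rewords its errors, which is one reason the broader catch-all retry may have been chosen originally; the trade-off is wasted retry budget on permanent failures.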
728-758: Subscription leak on eviction failure: reinsertion of the evicted account is required for rollback consistency.

When `pubsub_client.unsubscribe(evicted)` fails at line 738, the code correctly removes `pubkey` from the LRU cache to prevent subscription of the new account. However, `evicted` was already removed from the LRU by `add(*pubkey)` on line 734 and is not reinserted during rollback.

This creates a mismatch: `evicted` remains subscribed in the pubsub client but is no longer tracked in the LRU cache, causing a subscription leak. The diagnostic code at lines 223-254 will detect and warn about this inconsistency.

Restore the pre-add state by reinserting `evicted`:

```diff
 if let Err(err) = self.pubsub_client.unsubscribe(evicted).await {
     warn!(
         "Failed to unsubscribe from pubsub for evicted account {evicted}: {err:?}");
     // Rollback the LRU add since eviction failed
     self.lrucache_subscribed_accounts.remove(pubkey);
+    // Restore evicted to maintain consistency between LRU and pubsub
+    self.lrucache_subscribed_accounts.add(evicted);
     return Err(err);
 }
```

Since `pubkey` was just removed, capacity is available and reinserting `evicted` will not trigger another eviction.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- `magicblock-chainlink/src/chainlink/mod.rs` (5 hunks)
- `magicblock-chainlink/src/remote_account_provider/mod.rs` (28 hunks)
- `test-integration/test-schedule-intent/tests/test_schedule_intents.rs` (5 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/remote_account_provider/mod.rs
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.
Applied to files:
test-integration/test-schedule-intent/tests/test_schedule_intents.rs
📚 Learning: 2025-11-07T13:09:52.253Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: test-kit/src/lib.rs:275-0
Timestamp: 2025-11-07T13:09:52.253Z
Learning: In test-kit, the transaction scheduler in ExecutionTestEnv is not expected to shut down during tests. Therefore, using `.unwrap()` in test helper methods like `schedule_transaction` is acceptable and will not cause issues in the test environment.
Applied to files:
test-integration/test-schedule-intent/tests/test_schedule_intents.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.
Applied to files:
magicblock-chainlink/src/remote_account_provider/mod.rs
🧬 Code graph analysis (3)
magicblock-chainlink/src/chainlink/mod.rs (2)
- `magicblock-chainlink/src/chainlink/fetch_cloner.rs` (1): `new` (134-158)
- `magicblock-chainlink/src/chainlink/blacklisted_accounts.rs` (1): `blacklisted_accounts` (6-30)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (1)
test-integration/programs/flexi-counter/src/state.rs (1)
pda(32-35)
magicblock-chainlink/src/remote_account_provider/mod.rs (6)
- `magicblock-metrics/src/metrics/mod.rs` (5): `inc_account_fetches_failed` (399-401), `inc_account_fetches_found` (403-405), `inc_account_fetches_not_found` (407-409), `inc_account_fetches_success` (395-397), `set_monitored_accounts_count` (357-359)
- `magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs` (1): `new` (126-158)
- `magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs` (3): `new` (46-54), `new` (311-322), `try_new_from_url` (165-180)
- `magicblock-chainlink/src/submux/mod.rs` (1): `new` (147-158)
- `magicblock-chainlink/src/remote_account_provider/lru_cache.rs` (4): `new` (25-34), `pubkeys` (38-41), `pubkeys` (129-135), `len` (117-123)
- `magicblock-chainlink/src/remote_account_provider/config.rs` (2): `default` (65-72), `try_new_with_metrics` (27-42)
🔇 Additional comments (16)
test-integration/test-schedule-intent/tests/test_schedule_intents.rs (3)
66-69: LGTM! Verification helper appropriately placed. The undelegation verification is correctly positioned after the `schedule_intent` operation. Minor observation: line 247 includes a debug log after verification, but this call site doesn't; consider adding one for consistency with other tests.

193-196: LGTM! Multi-account verification correctly implemented. The verification helper is properly invoked with multiple pubkeys, correctly mapping from the payers vector. This ensures all undelegated accounts are verified in the ephemeral layer.

243-247: LGTM! Good example of verification with logging. This call site demonstrates the complete pattern with a debug log after verification. This pattern could be applied to other call sites (lines 69, 128) for consistency.
magicblock-chainlink/src/chainlink/mod.rs (4)
1-4: LGTM! The atomic imports are appropriate for the new counter-based account removal tracking in `reset_accounts_bank`.

139-191: LGTM! The refactored account removal logic with atomic counters provides clear telemetry. The corrected `non_empty` calculation using `saturating_sub` (from previous fixes) properly computes non-empty accounts without the `fetch_sub` pitfall.
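The `saturating_sub` point generalizes: when one gauge is derived from two independently updated atomic counters, subtracting the loaded values with saturation avoids an underflow wrap if the counters are momentarily inconsistent. A small std-only illustration (the counter names are illustrative, not the crate's):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let total_removed = AtomicUsize::new(5);
    let empty_removed = AtomicUsize::new(7); // momentarily ahead of total

    // Plain subtraction of unsigned values would wrap around to a huge
    // number here; saturating_sub on the loaded snapshots clamps at zero.
    let non_empty = total_removed
        .load(Ordering::Relaxed)
        .saturating_sub(empty_removed.load(Ordering::Relaxed));
    assert_eq!(non_empty, 0);

    // Once the counters are consistent again, the derived value is exact.
    empty_removed.store(2, Ordering::Relaxed);
    let non_empty = total_removed
        .load(Ordering::Relaxed)
        .saturating_sub(empty_removed.load(Ordering::Relaxed));
    assert_eq!(non_empty, 3);
}
```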
317-338: LGTM! The enhanced logging correctly gates on `Trace` level and uses the `trace!()` macro consistently. The `mark_empty_if_not_found` details provide valuable debugging context for account fetching.

361-377: LGTM! Upgrading undelegation logging from `trace!()` to `debug!()` is appropriate for these important operational events, making them visible without requiring trace-level verbosity.

magicblock-chainlink/src/remote_account_provider/mod.rs (9)
1-74: LGTM! The new imports support the enhanced functionality: Entry API for atomic fetch operations, metrics integration, and background task management. The `FetchResult` type alias improves code clarity, and the Helius-specific error code constant documents a provider-specific variation.
75-109: LGTM! The field rename to `lrucache_subscribed_accounts` is more descriptive, and the `_active_subscriptions_task_handle` properly maintains the background task lifetime through Drop semantics.
272-330: LGTM! The constructor properly initializes the LRU cache and conditionally starts the metrics updater based on configuration. The `NonZeroUsize` construction is safe due to the config validation invariant documented in the comment.
332-377: LGTM!The per-endpoint abort channels enable independent lifecycle management for each pubsub client, supporting the event-driven reconnection strategy mentioned in the PR objectives.
610-706: LGTM! The Entry API usage (lines 634-643) correctly appends to existing waiter lists instead of replacing them, fixing the concurrent fetch issue from previous reviews. The nested `Result` handling for `FetchResult` is appropriate for the channel-based architecture.
708-725: LGTM! The subscription setup cleanly delegates to the `subscribe` method for each pubkey, which properly handles deduplication via `is_watching` checks.
797-825: LGTM! The `unsubscribe` logic correctly maintains consistency between the LRU cache and pubsub client by only removing from the LRU after successful pubsub unsubscription (lines 809-813). The early return for non-tracked pubkeys (lines 801-807) prevents spurious errors.
849-886: LGTM! The `notify_error` helper and `retry!` macro cleanly encapsulate error handling and retry logic, properly integrating metrics and notifying all waiting requests on failure.
976-1055: LGTM! The metrics integration properly tracks found/not-found counts and updates gauges after successful fetches. The handling of `mark_empty_if_not_found` accounts (lines 991-1004) correctly distinguishes between truly missing accounts and those explicitly marked as empty.
```rust
/// Creates a background task that periodically updates the active subscriptions gauge
fn start_active_subscriptions_updater<PubsubClient: ChainPubsubClient>(
    subscribed_accounts: Arc<AccountsLruCache>,
    pubsub_client: Arc<PubsubClient>,
) -> task::JoinHandle<()> {
    task::spawn(async move {
        let mut interval = time::interval(Duration::from_millis(
            ACTIVE_SUBSCRIPTIONS_UPDATE_INTERVAL_MS,
        ));
        let never_evicted = subscribed_accounts.never_evicted_accounts();

        loop {
            interval.tick().await;
            let lru_count = subscribed_accounts.len();
            let (pubsub_total, pubsub_without_never_evict) = pubsub_client
                .subscription_count(Some(&never_evicted))
                .await;

            let all_pubsub_subs = if log::log_enabled!(log::Level::Debug) {
                pubsub_client.subscriptions()
            } else {
                vec![]
            };
            if lru_count != pubsub_without_never_evict {
                warn!(
                    "User account subscription counts LRU cache={} pubsub client={} don't match",
                    lru_count, pubsub_without_never_evict
                );
                if log::log_enabled!(log::Level::Debug) {
                    // Log all pubsub subscriptions for debugging
                    trace!(
                        "All pubsub subscriptions: {:?}",
                        all_pubsub_subs
                    );

                    // Find extra keys in pubsub that are not in LRU cache
                    let lru_pubkeys = subscribed_accounts.pubkeys();
                    let pubsub_subs_without_never_evict: HashSet<_> =
                        all_pubsub_subs
                            .iter()
                            .filter(|pk| !never_evicted.contains(pk))
                            .copied()
                            .collect();
                    let lru_pubkeys_set: HashSet<_> =
                        lru_pubkeys.into_iter().collect();

                    let extra_in_pubsub: Vec<_> =
                        pubsub_subs_without_never_evict
                            .difference(&lru_pubkeys_set)
                            .cloned()
                            .collect();
                    let extra_in_lru: Vec<_> = lru_pubkeys_set
                        .difference(&pubsub_subs_without_never_evict)
                        .cloned()
                        .collect();

                    if !extra_in_pubsub.is_empty() {
                        debug!("Extra pubkeys in pubsub client not in LRU cache: {:?}", extra_in_pubsub);
                    }
                    if !extra_in_lru.is_empty() {
                        debug!("Extra pubkeys in LRU cache not in pubsub client: {:?}", extra_in_lru);
                    }
                }
            }

            debug!("Updating active subscriptions: count={}", pubsub_total);
            trace!("All subscriptions: {}", pubkeys_str(&all_pubsub_subs));
            set_monitored_accounts_count(pubsub_total);
        }
    })
}
```
🧹 Nitpick | 🔵 Trivial
LGTM with optional improvement suggestion.
The background updater provides valuable subscription-count reconciliation. The diagnostic logging (lines 228-262) is properly gated behind debug-level checks to avoid overhead in production.
Optional: Consider graceful shutdown for the metrics task.
The background task runs indefinitely and will be aborted when the provider is dropped. For a metrics task this is acceptable, but you could add a CancellationToken parameter to allow clean shutdown if needed in the future.
fn start_active_subscriptions_updater<PubsubClient: ChainPubsubClient>(
subscribed_accounts: Arc<AccountsLruCache>,
pubsub_client: Arc<PubsubClient>,
+ shutdown: tokio_util::sync::CancellationToken,
) -> task::JoinHandle<()> {
task::spawn(async move {
let mut interval = time::interval(Duration::from_millis(
ACTIVE_SUBSCRIPTIONS_UPDATE_INTERVAL_MS,
));
let never_evicted = subscribed_accounts.never_evicted_accounts();
loop {
- interval.tick().await;
+ tokio::select! {
+ _ = interval.tick() => {},
+ _ = shutdown.cancelled() => {
+ debug!("Active subscriptions updater shutting down");
+ break;
+ }
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
magicblock-chainlink/src/remote_account_provider/mod.rs around lines 200 to 270:
the background updater currently runs forever and is only aborted when the
provider is dropped; add an explicit CancellationToken parameter to
start_active_subscriptions_updater and use it inside the spawned task to stop
the loop cleanly (e.g., use tokio::select! to await either interval.tick().await
or token.cancelled().await and break/return on cancellation), propagate the
token from the caller when spawning the task, and ensure the task exits promptly
so the JoinHandle completes gracefully.
```rust
fn verify_undelegation_in_ephem_via_owner(
    pubkeys: &[Pubkey],
    ctx: &IntegrationTestContext,
) {
    const RETRY_LIMIT: usize = 20;
    let mut retries = 0;

    loop {
        ctx.wait_for_next_slot_ephem().unwrap();
        let mut not_verified = vec![];
        for pk in pubkeys.iter() {
            let counter_pda = FlexiCounter::pda(pk).0;
            let owner = ctx.fetch_ephem_account_owner(counter_pda).unwrap();
            if owner == delegation_program_id() {
                not_verified.push(*pk);
            }
        }
        if not_verified.is_empty() {
            break;
        }
        retries += 1;
        if retries >= RETRY_LIMIT {
            panic!(
                "Failed to verify undelegation for pubkeys: {}",
                not_verified
                    .iter()
                    .map(|k| k.to_string())
                    .collect::<Vec<_>>()
                    .join(", ")
            );
        }
    }
}
```
🧹 Nitpick | 🔵 Trivial
Consider adding progress logging to aid debugging.
The verification logic is correct: it polls until all accounts' owners are no longer `delegation_program_id()` or the retry limit is exceeded. The retry limit of 20 and the polling approach are reasonable for test scenarios.
Consider adding debug logging to track verification progress, which would help diagnose issues if the retry limit is reached:
fn verify_undelegation_in_ephem_via_owner(
pubkeys: &[Pubkey],
ctx: &IntegrationTestContext,
) {
const RETRY_LIMIT: usize = 20;
let mut retries = 0;
loop {
ctx.wait_for_next_slot_ephem().unwrap();
let mut not_verified = vec![];
for pk in pubkeys.iter() {
let counter_pda = FlexiCounter::pda(pk).0;
let owner = ctx.fetch_ephem_account_owner(counter_pda).unwrap();
if owner == delegation_program_id() {
not_verified.push(*pk);
}
}
if not_verified.is_empty() {
+ debug!("✅ Verified undelegation for {} accounts", pubkeys.len());
break;
}
retries += 1;
+ debug!(
+ "⏳ Undelegation verification retry {}/{}: {} accounts remaining",
+ retries,
+ RETRY_LIMIT,
+ not_verified.len()
+ );
if retries >= RETRY_LIMIT {
panic!(
"Failed to verify undelegation for pubkeys: {}",
not_verified
.iter()
.map(|k| k.to_string())
.collect::<Vec<_>>()
.join(", ")
);
}
}
}

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In test-integration/test-schedule-intent/tests/test_schedule_intents.rs around
lines 460 to 492, add progress logging inside the polling loop to emit the
current retry count and the list of pubkeys still not verified each iteration
and a success log when verification completes; specifically, log (e.g., via
tracing::info! or println!) the retries and not_verified pubkeys before
incrementing retries so failures are easier to diagnose, and optionally log a
final confirmation once not_verified.is_empty() before breaking.
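The retry-with-limit pattern used by `verify_undelegation_in_ephem_via_owner` can be distilled into a small generic helper. A hypothetical std-only sketch (`poll_until_verified` is illustrative, not part of the test suite):

```rust
// Polls `check` until it reports no unverified items or the retry limit
// is hit; returns the stragglers on failure so the caller can report them.
fn poll_until_verified<F>(
    mut check: F,
    retry_limit: usize,
) -> Result<usize, Vec<String>>
where
    F: FnMut() -> Vec<String>,
{
    for attempt in 1..=retry_limit {
        let not_verified = check();
        if not_verified.is_empty() {
            return Ok(attempt); // how many polls it took
        }
        if attempt == retry_limit {
            return Err(not_verified);
        }
    }
    // retry_limit == 0: report whatever is currently unverified.
    Err(check())
}

fn main() {
    // Simulate accounts becoming verified on the third poll.
    let mut polls = 0;
    let result = poll_until_verified(
        || {
            polls += 1;
            if polls < 3 { vec!["acc1".to_string()] } else { vec![] }
        },
        20,
    );
    assert_eq!(result, Ok(3));

    // An account that never flips owner exhausts the limit.
    let stuck = poll_until_verified(|| vec!["acc2".to_string()], 2);
    assert_eq!(stuck, Err(vec!["acc2".to_string()]));
}
```

Returning the attempt count also gives the caller a natural hook for the progress logging the review suggests.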
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
- `magicblock-chainlink/src/chainlink/fetch_cloner.rs` (5 hunks)
- `magicblock-chainlink/src/chainlink/mod.rs` (5 hunks)
- `magicblock-metrics/src/metrics/mod.rs` (9 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
📚 Learning: 2025-10-14T09:56:14.047Z
Learnt from: taco-paco
Repo: magicblock-labs/magicblock-validator PR: 564
File: test-integration/programs/flexi-counter/src/processor/call_handler.rs:122-125
Timestamp: 2025-10-14T09:56:14.047Z
Learning: The file test-integration/programs/flexi-counter/src/processor/call_handler.rs contains a test smart contract used for integration testing, not production code.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
📚 Learning: 2025-10-21T14:00:54.642Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/chainlink/fetch_cloner.rs
magicblock-metrics/src/metrics/mod.rs
📚 Learning: 2025-11-07T14:20:31.457Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
Applied to files:
magicblock-chainlink/src/chainlink/mod.rs
magicblock-chainlink/src/chainlink/fetch_cloner.rs
📚 Learning: 2025-10-26T16:53:29.820Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: magicblock-chainlink/src/remote_account_provider/mod.rs:134-0
Timestamp: 2025-10-26T16:53:29.820Z
Learning: In magicblock-chainlink/src/remote_account_provider/mod.rs, the `Endpoint::separate_pubsub_url_and_api_key()` method uses `split_once("?api-key=")` because the api-key parameter is always the only query parameter right after `?`. No additional query parameter parsing is needed for this use case.
Applied to files:
magicblock-chainlink/src/chainlink/fetch_cloner.rs
🧬 Code graph analysis (2)
magicblock-chainlink/src/chainlink/mod.rs (2)
magicblock-chainlink/src/chainlink/blacklisted_accounts.rs (1)
`blacklisted_accounts` (6-30)
magicblock-metrics/src/metrics/mod.rs (1)
inc_undelegation_requested(427-429)
magicblock-chainlink/src/chainlink/fetch_cloner.rs (2)
magicblock-metrics/src/metrics/mod.rs (1)
`inc_undelegation_completed` (431-433)
magicblock-chainlink/src/remote_account_provider/config.rs (1)
try_new_with_metrics(27-42)
```rust
                        if let Some(in_bank) =
                            self.accounts_bank.get_account(&pubkey)
                        {
                            if !in_bank.delegated()
                                && in_bank.owner().eq(&dlp::id())
                                && !account.owner().eq(&dlp::id())
                            {
                                debug!(
                                    "Undelegation completed for account: {pubkey}"
                                );
                                magicblock_metrics::metrics::inc_undelegation_completed();
                            }
```
Fix undelegation completion detection logic.
During cloning we explicitly set delegated accounts to `delegated = true` and overwrite their owner with the delegation record's owner (see the `set_owner(...).set_delegated(is_delegated_to_us)` path in this file). That means a still-delegated account lives in the bank with `delegated() == true` and `owner() != dlp::id()`. The new guard checks the exact opposite (`!in_bank.delegated()` and `in_bank.owner() == dlp::id()`), so it never trips for the happy path. As a result `inc_undelegation_completed()` will never run, making the new metric useless. Flip the predicate to look for an entry we previously considered delegated and an update whose owner is no longer `dlp::id()`, e.g.:
```diff
- if !in_bank.delegated()
-     && in_bank.owner().eq(&dlp::id())
-     && !account.owner().eq(&dlp::id())
- {
+ if in_bank.delegated()
+     && !account.delegated()
+     && !account.owner().eq(&dlp::id())
+ {
```

This fires exactly once on the first post-undelegation update and keeps the metric aligned with the new observability story.
🤖 Prompt for AI Agents
In magicblock-chainlink/src/chainlink/fetch_cloner.rs around lines 224 to 235,
the undelegation completion check is inverted and never matches the actual
cloned state; change the guard to detect an entry that was previously marked
delegated in the bank (in_bank.delegated() == true and in_bank.owner() ==
dlp::id()) and which is now updated so its owner is no longer dlp::id() (i.e.
!account.owner().eq(&dlp::id())), then run the debug log and
magicblock_metrics::metrics::inc_undelegation_completed() in that branch so the
metric fires once on the first post-undelegation update.
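As a sanity check on the flipped predicate, here is a minimal self-contained sketch; the `Acct` struct and its fields are illustrative stand-ins for the bank entry and incoming update types, not the real account API:

```rust
// Illustrative stand-in for a bank entry or incoming account update.
#[derive(Clone, Copy)]
struct Acct {
    delegated: bool,
    owned_by_dlp: bool, // stand-in for owner() == dlp::id()
}

// Corrected predicate: the bank still considers the account delegated,
// while the incoming update is no longer delegated nor owned by dlp.
fn undelegation_completed(in_bank: Acct, update: Acct) -> bool {
    in_bank.delegated && !update.delegated && !update.owned_by_dlp
}

fn main() {
    let in_bank = Acct { delegated: true, owned_by_dlp: false };
    let post_undelegation = Acct { delegated: false, owned_by_dlp: false };
    let still_delegated = Acct { delegated: true, owned_by_dlp: false };
    // Fires on the first post-undelegation update...
    assert!(undelegation_completed(in_bank, post_undelegation));
    // ...but not while the account remains delegated.
    assert!(!undelegation_completed(in_bank, still_delegated));
    println!("predicate ok");
}
```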
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
magicblock-ledger/src/ledger_truncator.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-10-28T13:15:42.706Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 596
File: magicblock-processor/src/scheduler.rs:1-1
Timestamp: 2025-10-28T13:15:42.706Z
Learning: In magicblock-processor, transaction indexes were always set to 0 even before the changes in PR #596. The proper transaction indexing within slots will be addressed during the planned ledger rewrite.
Applied to files:
magicblock-ledger/src/ledger_truncator.rs
📚 Learning: 2025-11-07T13:20:13.793Z
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 589
File: magicblock-processor/src/scheduler/coordinator.rs:227-238
Timestamp: 2025-11-07T13:20:13.793Z
Learning: In magicblock-processor's ExecutionCoordinator (scheduler/coordinator.rs), the `account_contention` HashMap intentionally does not call `shrink_to_fit()`. Maintaining slack capacity is beneficial for performance by avoiding frequent reallocations during high transaction throughput. As long as empty entries are removed from the map (which `clear_account_contention` does), the capacity overhead is acceptable.
Applied to files:
magicblock-ledger/src/ledger_truncator.rs
🔇 Additional comments (1)
magicblock-ledger/src/ledger_truncator.rs (1)
206-207: Good use of `spawn_blocking` to offload heavy deletion work.

Moving RocksDB deletions to a blocking task prevents stalling the Tokio async runtime, which improves responsiveness during large truncation operations.
```rust
if let Err(err) = delete_handle.await {
    error!("Ledger delete task cancelled: {err}");
}

Self::compact_slot_range(ledger, from_slot, to_slot).await;
```
Consider skipping compaction if the deletion task fails.
If the blocking deletion task is cancelled or panics (line 233), the function logs the error but still proceeds to compact (line 237). This means compaction runs even when slot deletions may have failed or only partially completed, which could be wasteful or lead to unexpected ledger state.
Consider returning early if `delete_handle.await` fails, or at minimum document why compaction should proceed despite deletion failures.
Apply this diff to skip compaction when deletion fails:
```diff
- if let Err(err) = delete_handle.await {
-     error!("Ledger delete task cancelled: {err}");
+ match delete_handle.await {
+     Ok(_) => {
+         Self::compact_slot_range(ledger, from_slot, to_slot).await;
+     }
+     Err(err) => {
+         error!("Ledger delete task cancelled, skipping compaction: {err}");
+         return;
+     }
  }
-
- Self::compact_slot_range(ledger, from_slot, to_slot).await;
```

🤖 Prompt for AI Agents
In magicblock-ledger/src/ledger_truncator.rs around lines 233 to 237, the code
logs an error when delete_handle.await fails but still proceeds to call
Self::compact_slot_range(...); modify the flow to skip compaction when the
deletion task failed by returning early (or otherwise short-circuiting) after
logging the error so compaction only runs on successful deletion; ensure the
function returns or exits appropriately in the error branch and keep the
existing log message (or augment it) to preserve diagnostics.
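The suggested early return is an instance of a general pattern: gate a follow-up step on a joined task's success. A dependency-free sketch of that pattern, using `std::thread` in place of tokio's `spawn_blocking` and illustrative names throughout:

```rust
use std::thread;

// Stand-in for the deletion work; a panicking thread models a
// failed/cancelled blocking task whose join() returns Err.
fn run_truncation(delete_ok: bool) -> &'static str {
    let handle = thread::spawn(move || {
        if !delete_ok {
            panic!("delete failed");
        }
    });
    // Only run the follow-up (compaction) when the task joined successfully.
    match handle.join() {
        Ok(()) => "compacted",
        Err(_) => "skipped compaction",
    }
}

fn main() {
    assert_eq!(run_truncation(true), "compacted");
    assert_eq!(run_truncation(false), "skipped compaction");
    println!("ok");
}
```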
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (1)
Cargo.toml (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bmuddha
Repo: magicblock-labs/magicblock-validator PR: 578
File: magicblock-aperture/src/requests/websocket/account_subscribe.rs:18-27
Timestamp: 2025-10-21T14:00:54.642Z
Learning: In magicblock-aperture account_subscribe handler (src/requests/websocket/account_subscribe.rs), the RpcAccountInfoConfig fields data_slice, commitment, and min_context_slot are currently ignored—only encoding is applied. This is tracked as technical debt in issue #579: https://github.com/magicblock-labs/magicblock-validator/issues/579
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 621
File: magicblock-chainlink/src/remote_account_provider/chain_pubsub_actor.rs:457-495
Timestamp: 2025-11-07T14:20:31.457Z
Learning: In magicblock-chainlink/src/remote_account_provider/chain_pubsub_client.rs, the unsubscribe closure returned by PubSubConnection::account_subscribe(...) resolves to () (unit), not a Result. Downstream code should not attempt to inspect an unsubscribe result and can optionally wrap it in a timeout to guard against hangs.
📚 Learning: 2025-10-26T16:54:39.084Z
Learnt from: thlorenz
Repo: magicblock-labs/magicblock-validator PR: 587
File: test-manual/Cargo.toml:0-0
Timestamp: 2025-10-26T16:54:39.084Z
Learning: In the magicblock-validator repository, use git branch references (not commit hashes or tags) for the helius-laserstream dependency to allow automatic updates when the branch is pushed to.
Applied to files:
Cargo.toml
🔇 Additional comments (2)
Cargo.toml (2)
94-94: Confirm the log feature removal is intentional.

Removing `features = ["release_max_level_info"]` means debug and trace logs will now compile into release builds. This increases binary size and runtime logging overhead in production. Verify this aligns with the PR's metrics/logging objectives and that the trade-off is acceptable.
94-232: Verification complete—no issues found.

The `ureq = "2.9.6"` dependency is correctly added in `test-integration/Cargo.toml`, with `test-integration/test-tools/Cargo.toml` referencing it as a workspace dependency. The `url = "2.5.0"` dependency is present in the root workspace Cargo.toml (line 218 in the provided snippet) and also declared in integration test modules. Both dependencies are properly positioned for integration test metrics fetching as described in the PR objectives.
This reverts commit a8900e1.
Summary by CodeRabbit
New Features
Bug Fixes
Improvements
Enhances subscription management and reliability by adding metrics and robust reconnection logic with automatic resubscription.
Details
Subscription Management Improvements
Metrics and Monitoring
`subscription_count` method added to `ChainPubsubClient` trait for tracking active subscriptions across all clients
Metrics support in `RemoteAccountProviderConfig`
Reconnection and Reliability
`ReconnectableClient` trait
`07_subscription_limits.rs` to test large-scale subscription scenarios (400 accounts)
`ReconnectableClient` trait for better abstraction
`ureq` and `url` for metrics fetching in integration tests
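The reconnection-with-resubscription flow described above boils down to a retry loop with backoff, after which all tracked subscriptions are re-established. A minimal synchronous sketch of that loop (the `connect` closure, delays, and return shape are illustrative, not the actual `ReconnectableClient` API):

```rust
use std::time::Duration;

// Retry a connect attempt with exponential backoff, up to `max_attempts`.
// Returns the attempt number that succeeded, or None if all attempts failed.
fn reconnect_with_backoff(
    mut connect: impl FnMut() -> bool,
    max_attempts: u32,
) -> Option<u32> {
    let mut delay = Duration::from_millis(10);
    for attempt in 1..=max_attempts {
        if connect() {
            // On success the real client would resubscribe all tracked accounts.
            return Some(attempt);
        }
        std::thread::sleep(delay);
        delay = delay.saturating_mul(2); // exponential backoff
    }
    None
}

fn main() {
    let mut calls = 0;
    let attempt = reconnect_with_backoff(
        || {
            calls += 1;
            calls >= 3 // fail twice, then succeed on the third try
        },
        5,
    );
    assert_eq!(attempt, Some(3));
    println!("reconnected on attempt {}", attempt.unwrap());
}
```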